Fault tolerance and configurability in DSM coherence protocols

Authors

  • Brett D. Fleisch
  • Heiko Michel
  • Sachin K. Shah
  • Oliver E. Theel
Abstract

With the advent of large networks and the demand for uninterrupted service, computer systems need to be more robust and fault tolerant. There are numerous ways to implement fault tolerance and recovery. Central to all of these methods is the requirement that data be replicated for high availability. We believe that a protocol must not only provide replication, but do so at low operation overhead. Further, the protocol must provide configurable mechanisms for varying the level of replication, so that the system may be operated at the desired overhead cost. We have developed several Distributed Shared Memory (DSM) protocols and use them in a program-driven simulation to examine their robustness, fault tolerance, and configurability. Our investigation compares the Write-Invalidate, Write-Invalidate with Downgrading, and Write-Broadcast protocols with several instances of the Boundary-Restricted coherence protocol class. The DSM application suite contains programs representative of various memory-access patterns and behaviors. This paper examines the performance of these protocols under different workloads and analyzes the operation costs, fault tolerance, and configurability of each.

Related resources

Fault Tolerance and Configurability in DSM Coherence

tolerance. To address these aspects, the DSM coherence protocol must offer increased redundancy, decreased reliance on centralized data and control, support for servicing requests locally, and control over the degree of data availability on a per-data-unit basis. In a page-based DSM system, as assumed in this article, the unit of interest is a DSM page. Object-based systems can use the same pro...

Design and Analysis of Highly Available and Scalable Coherence Protocols for Distributed Shared Memory Systems Using Stochastic Modeling

Larger size networks require DSM coherence protocols which scale well. Fault-tolerance in terms of high availability is required for data access and for uninterrupted DSM service since large-scale environments have a greater number of potentially malfunctioning components. We present a new class of coherence protocols for DSM systems whose instances offer highly available access to shared data a...

Flexible Fault Tolerance in Configurable Middleware for Embedded Systems

MicroQoSCORBA (MQC) is a middleware platform that focuses on embedded applications by providing a very fine level of configurability of its internal orthogonal components. Using this configurability, a developer can generate a customized middleware instantiation that is tailored to both the requirements and constraints of a specific embedded application and the embedded hardware. One of the key...

Fast and Low Cost Recovery Techniques for Distributed Shared Memory

The goal of this paper is to indicate how the mechanisms already available in standard Distributed Shared Memory (DSM) systems can be efficiently used to reduce the cost of fault-tolerance. It can be achieved by using DSM replication mechanism for recovery and integration of both recovery and memory coherence protocols. We analyze recently developed techniques of recovery for DSM systems which ...

A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability

Large-scale distributed systems are very attractive for the execution of parallel applications requiring a huge computing power. However, their high probability of site failure is unacceptable, especially for long time running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM). Although most recover...

Journal:
  • IEEE Concurrency

Volume 8  Issue 

Pages  -

Publication date 2000